Given the advantage and recent success of English character-level and subword-unit models in several NLP tasks, we consider the equivalent modeling problem for Chinese. Chinese script is logographic, and many Chinese logograms are composed of common substructures that provide semantic, phonetic, and syntactic hints. In this work, we propose to explicitly incorporate the visual appearance of a character's glyph in its representation, resulting in a novel glyph-aware embedding of Chinese characters. Inspired by the success of convolutional neural networks in computer vision, we use them to incorporate the spatio-structural patterns of Chinese glyphs as rendered in raw pixels. In the context of two basic Chinese NLP tasks, language modeling and word segmentation, the model learns to represent each character's task-relevant semantic and syntactic information in the character-level embedding.
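
As a rough illustration of the idea of deriving a character embedding from glyph pixels, the following minimal sketch applies a single convolutional layer, a ReLU, and global max pooling to a small bitmap. It is not the architecture used in this work: the 8x8 hand-drawn bitmap (a crude stand-in for a rendered glyph such as "十"), the 3x3 kernel size, the number of kernels, the random untrained weights, and the names `conv2d_valid`, `glyph_embedding`, `make_kernels`, and `TOY_GLYPH` are all assumptions made for illustration.

```python
import random


def conv2d_valid(image, kernel):
    """2D cross-correlation without padding over a list-of-lists image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def glyph_embedding(bitmap, kernels):
    """Map a glyph bitmap to a vector: conv -> ReLU -> global max pool.

    The result has one entry per kernel, so len(kernels) is the
    (hypothetical) embedding dimension.
    """
    embedding = []
    for kernel in kernels:
        feature_map = conv2d_valid(bitmap, kernel)
        pooled = max(max(0.0, v) for row in feature_map for v in row)
        embedding.append(pooled)
    return embedding


def make_kernels(num_kernels, size, seed=0):
    """Random, untrained kernels; in a real model these would be learned."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(size)] for _ in range(size)]
        for _ in range(num_kernels)
    ]


# Toy 8x8 binary bitmap: one horizontal and one vertical stroke.
# An assumption for illustration; real glyphs would be rendered from a font.
_V = [0, 0, 0, 1, 1, 0, 0, 0]
_H = [1, 1, 1, 1, 1, 1, 1, 1]
TOY_GLYPH = [_V, _V, _V, _H, _H, _V, _V, _V]

if __name__ == "__main__":
    kernels = make_kernels(num_kernels=4, size=3)
    print(glyph_embedding(TOY_GLYPH, kernels))
```

In a trained model the kernels would be learned jointly with the downstream task (here, language modeling or word segmentation), so that the pooled features carry task-relevant information rather than random projections of the strokes.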